2022-11-17

The Basic Problem

Multilevel models were developed for the analysis of hierarchically structured data. A hierarchy consists of lower-level observations nested within one or more higher levels.

Example:

  • Level 1: measurement at one time
  • Level 2: student
  • Level 3: class
  • Level 4: school
  • Level 5: district

The linear multilevel model

A special regression model that is suitable for hierarchical data.

Difference from traditional regression:
more than one error term (at least one per level)

Let \(i\) index the Level 1 units (\(i = 1,\dots,n_j\)) and \(j\) index the Level 2 units (\(j = 1,\dots,J\)); then the DV \(Y_{ij}\) at Level 1 is modeled as:

\(Y_{ij}=\alpha_j+\beta_jX_{ij}+\epsilon_{ij}\)

\(\alpha_j= \mu+\gamma Z_j + u_j\)
\(\beta_j=\theta+ \eta Z_j + v_j\)


where \(X_{ij}\) is a Level 1 variable, and \(Z_j\) is a Level 2 variable
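Substituting the Level 2 equations into the Level 1 equation gives the reduced (combined) form, which makes explicit that the cross-level interaction \(Z_j X_{ij}\) arises automatically:

\(Y_{ij} = \underbrace{\mu + \gamma Z_j + \theta X_{ij} + \eta Z_j X_{ij}}_{\text{fixed part}} + \underbrace{u_j + v_j X_{ij} + \epsilon_{ij}}_{\text{random part}}\)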

Why we can’t just use normal regression.

  1. faithful to the data structure
  2. individuals within a group are similar (correlated) and therefore carry less information than independent individuals (the effective n is overestimated and SEs are too small)
  3. effects on different levels need not be the same
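Point 2 can be quantified with the design effect (a standard result, Kish's design effect, not derived in these slides): with groups of size \(m\) and intraclass correlation \(\rho\), the effective sample size is

\(n_{\text{eff}} = \frac{n}{1 + (m - 1)\rho}\)

For example, with \(n = 500\), \(m = 20\), and \(\rho = 0.1\), \(n_{\text{eff}} \approx 172\).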

similar cases share information

Would the regression line look much different with just one point per group?

effects on different levels can differ

Another thing is different: CENTERING

2 Options:

  • grand mean centering
    • centering like in normal regression
    • linear transformation -> only intercept changes (model is equivalent)
  • group mean centering
    • subtract the individual’s group mean from the individual’s score
    • parameters change -> model is NOT equivalent
    • group means can be added as group predictors (to disentangle micro and macro level contributions)
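The two options can be sketched in base R (a minimal sketch; the data frame `d`, score `x`, and grouping variable `g` are made up for illustration):

```r
d <- data.frame(g = rep(1:3, each = 4),
                x = c(1, 3, 2, 4,  5, 7, 6, 8,  2, 6, 4, 8))

# grand mean centering: subtract the overall mean (model stays equivalent)
d$x_grand <- d$x - mean(d$x)

# group mean centering: subtract each individual's group mean (model changes)
d$g_mean  <- ave(d$x, d$g)      # group means, repeated per individual
d$x_group <- d$x - d$g_mean     # within-group deviations

# the group means can be added as an L2 predictor
d$g_mean_c <- d$g_mean - mean(d$g_mean)
```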

When to center how?

This is a complex question (e.g., see Hofmann & Gavin (1998), Paccagnella (2006), Enders & Tofighi (2007), and Hamaker & Grasman (2015) for discussions) with no easy answer.

  • Raw:
    • if one is interested in intercept and intercept variance when predictor is 0
  • Grand mean centering:
    • often used for higher level variable to facilitate interpretation
    • interest in a L2 predictor and want to control for L1 covariates
    • interest in interactions between L2 variables
  • Group mean centering:
    • purpose is disentangling effects on different levels (add group means as predictor)
    • if multilevel collinearity is high (e.g., student’s age at L1 and school level at L2)
    • L1 association between \(X\) and \(Y\) is of substantive interest
    • cross-level interactions and L1 interactions

The correct centering depends solely on your research question! Use different centerings for different questions (even within a single analysis).

Let’s look at a practical example

We will use data from the article Determinants of healthcare worker turnover in intensive care units: A micro-macro multilevel analysis by Daouda, Hocine & Temime (2021).

Variables:

  • stress level … DV (nurse’s reported stress level)
  • support from colleagues … L1 IV (per nurse)
  • type of ICU … L2 IV (per unit)

To disentangle the L1 and L2 effects of support from colleagues, we will group mean center it and also use its group means as a predictor.

So, instead of support from colleagues we will use:

  • support_l1
  • support_l2

Further, support_l2 will be grand mean centered.
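This preprocessing could look as follows (a sketch; the column name `support from colleagues` is assumed from the variable list above). Note that the last line centers at the mean over nurses, which weights units by their size; centering at the mean of the unit means would be an alternative:

```r
library(dplyr)

df_ICU <- df_ICU |>
  group_by(CodeService) |>
  mutate(support_l2 = mean(`support from colleagues`),            # unit means
         support_l1 = `support from colleagues` - support_l2) |>  # group mean centered
  ungroup() |>
  mutate(support_l2 = support_l2 - mean(support_l2))              # grand mean centered
```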

Initial exploratory analyses

Things you know from previous sessions:

  • skim through the data
  • univariate summaries (numerical + graphical)
  • bivariate summaries (numerical + graphical)
  • bivariate summaries faceted/grouped by higher level units

Example of bivariate summaries faceted/grouped by higher level units

Model Building Strategy

  • start simple (establish a baseline for evaluating larger models)
  • add covariates one at a time, level by level, maybe centering certain variables
  • finally, examine the random effects and variance components, beginning with a full set of error terms and then removing covariance terms and variance terms where advisable (for instance, when parameter estimates are failing to converge or producing impossible or unlikely values)

This strategy closely follows the approach of Raudenbush & Bryk (2002).

Other strategies are possible. E.g., Diggle et al. (2002) begin with a saturated fixed-effects model, determine the variance components based on that, and then simplify the fixed part of the model after fixing the random part.

Digression: Random vs. Fixed Effects

fixed effects

  • levels of a factor we want to draw inferences from
  • would not change in replications of the study
  • e.g., type of ICU

random effects

  • levels of a factor which is just a sample from a larger population of factor levels
  • not interested in drawing conclusions about specific levels
  • but, interested in accounting for the influence of the random effect
  • e.g., specific unit

An Initial Model: Random Intercepts

  • no predictors at either level
  • assess variation at each level
model_0 <- lmer(`stress level` ~ 
  1+(1|CodeService), REML=T, data=df_ICU)
## Linear mixed model fit by REML ['lmerMod']
## Formula: `stress level` ~ 1 + (1 | CodeService)
##    Data: df_ICU
## 
## REML criterion at convergence: 3126.4
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.4204 -0.7003 -0.1003  0.6496  3.1226 
## 
## Random effects:
##  Groups      Name        Variance Std.Dev.
##  CodeService (Intercept)  0.7933  0.8907  
##  Residual                21.7261  4.6611  
## Number of obs: 526, groups:  CodeService, 30
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)  22.6626     0.2641    85.8
  • \(\hat{\alpha}_0=22.66\) = est. mean stress level across all units
  • \(\hat{\sigma}^2=21.73\) = est. variance of the within-unit deviations
  • \(\hat{\sigma}_{u}^{2}=0.79\) = est. variance of the between-unit deviations

intraclass correlation coefficient (ICC)

\(\hat{\rho}=\frac{\hat{\sigma}_{u}^{2}}{\hat{\sigma}_{u}^{2}+\hat{\sigma}^2}\) = 0.034

3.4% of the total variability in stress levels is attributable to differences among units.

As \(\rho\) approaches 0, the nurses’ stress levels are essentially independent, and the effective sample size is close to the number of nurses. As \(\rho\) approaches 1, the stress levels of all nurses within a unit become equal, and the effective sample size is close to the number of units.
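The ICC can be computed directly from the variance components of model_0 (a sketch using lme4’s VarCorr; performance::icc() is a convenient alternative):

```r
vc  <- as.data.frame(VarCorr(model_0))   # columns: grp, var1, var2, vcov, sdcor
icc <- vc$vcov[vc$grp == "CodeService"] / sum(vc$vcov)
icc  # 0.7933 / (0.7933 + 21.7261), the proportion of variance between units
```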

Digression: Model estimation

In multilevel models, parameters are estimated with likelihood-based methods (instead of OLS). The most common methods are:

  • maximum likelihood (ML)
  • restricted maximum likelihood (REML)

REML accounts for the loss in degrees of freedom due to estimating the fixed effects and provides unbiased estimates of the variance components. It is therefore preferable in most cases.

ML should be used if nested models that differ in their fixed effects are compared with a likelihood ratio test (comparing nested random-effects structures is fine with REML).

If you want to know more about this topic, then take a look at Singer & Willett (2003).

Random Slopes and Intercepts Model

model_1<-lmer(`stress level` ~ support_l1+
  (support_l1|CodeService), REML=T, data=df_ICU)
## Linear mixed model fit by REML ['lmerMod']
## Formula: `stress level` ~ support_l1 + (support_l1 | CodeService)
##    Data: df_ICU
## 
## REML criterion at convergence: 3126
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -2.48406 -0.69531 -0.07206  0.65423  3.10262 
## 
## Random effects:
##  Groups      Name        Variance Std.Dev. Corr 
##  CodeService (Intercept)  0.8528  0.9235        
##              support_l1   0.1921  0.4383   -0.45
##  Residual                21.1581  4.5998        
## Number of obs: 526, groups:  CodeService, 30
## 
## Fixed effects:
##              Estimate Std. Error t value
## (Intercept) 22.657749   0.265841  85.231
## support_l1  -0.007473   0.140485  -0.053
## 
## Correlation of Fixed Effects:
##            (Intr)
## support_l1 -0.168
  • \(\hat{\alpha}_{0}=22.66\) … mean stress level at mean support_l1
  • \(\hat{\beta}_{0}=-0.007\) … mean change in stress level for a one-unit increase in support_l1
  • \(\hat{\sigma}^2=21.2\) … variance of the within-unit deviations
  • \(\hat{\sigma}_{u}^{2}=0.85\) … variance of the between-unit intercept deviations
  • \(\hat{\sigma}_{v}^{2}=0.19\) … variance of the support_l1 slopes across units
  • \(\hat{\rho}_{uv}=-0.45\) … correlation between the unit intercepts and support_l1 slopes

Visualization of model_1

Digression: Parameter tests

Getting p-values in multilevel models is not trivial, because the exact distribution of the test statistics under the null hypothesis (no fixed effect) is unknown (Bates et al. 2015).

Approaches:

  • t-values (ratios of parameter estimates to their estimated standard errors) with an absolute value above 2 indicate significant evidence that a particular model parameter differs from 0
  • tests based on conservative assumptions, large-sample results, or approximate degrees of freedom for a t-distribution
  • bootstrap approaches, e.g., the parametric bootstrap

Example: Parametric Bootstrap

The parametric bootstrap simulates bootstrap samples from the estimated distribution functions. That is, error terms and random effects are simulated from their estimated normal distributions and are combined into bootstrap samples via the fitted model equation.

boot_model_1 <- lmeresampler::bootstrap(
  model_1, .f = fixef, type = "parametric", B = 1000)
confint(boot_model_1, type = "perc")
## # A tibble: 2 × 6
##   term        estimate  lower  upper type  level
##   <chr>          <dbl>  <dbl>  <dbl> <chr> <dbl>
## 1 (Intercept) 22.7     22.1   23.2   perc   0.95
## 2 support_l1  -0.00747 -0.290  0.252 perc   0.95

Add level 2 predictor

model_2<-lmer(`stress level`~support_l1+support_l2+
  (support_l1|CodeService), REML=T, data=df_ICU)
## Linear mixed model fit by REML ['lmerMod']
## Formula: `stress level` ~ support_l1 + support_l2 + (support_l1 | CodeService)
##    Data: df_ICU
## 
## REML criterion at convergence: 3126
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -2.48287 -0.70088 -0.06052  0.65553  3.07384 
## 
## Random effects:
##  Groups      Name        Variance Std.Dev. Corr 
##  CodeService (Intercept)  0.9010  0.9492        
##              support_l1   0.1916  0.4377   -0.37
##  Residual                21.1592  4.5999        
## Number of obs: 526, groups:  CodeService, 30
## 
## Fixed effects:
##              Estimate Std. Error t value
## (Intercept) 22.661044   0.269030  84.232
## support_l1  -0.008085   0.140481  -0.058
## support_l2  -0.146307   0.385898  -0.379
## 
## Correlation of Fixed Effects:
##            (Intr) sppr_1
## support_l1 -0.141       
## support_l2 -0.006  0.012

Like support_l1, an increase in support_l2 slightly reduces the stress level (-0.15); however, both effects are not significant (not formally tested, but | t | is far from 2).

Add second level 2 predictor

model_3 <- lmer(`stress level` ~ `type of ICU` + support_l1 + support_l2 +
  (support_l1 | CodeService), REML=T, data=df_ICU)

model_3_ML <- lmer(`stress level` ~ `type of ICU` + support_l1 + support_l2 +
  (support_l1 | CodeService), REML=F, data=df_ICU)
##  Groups      Name        Variance Std.Dev. Corr  
##  CodeService (Intercept)  0.59081 0.76864        
##              support_l1   0.19536 0.44200  -0.640
##  Residual                21.21546 4.60602
##  Number of Level Two groups =  30
##                   Estimate Std. Error     t value
## (Intercept)    22.72183488  0.4168353 54.51034544
## `type of ICU`1 -0.65742457  0.5796305 -1.13421329
## `type of ICU`2  0.93917067  0.6860464  1.36896077
## support_l1     -0.01408992  0.1409538 -0.09996129
## support_l2     -0.13764067  0.3798257 -0.36237851

Neither ICU Type 1 nor Type 2 seems to differ significantly from Type 0. But it seems that Types 1 and 2 could differ from each other.

How would we test that?
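One option is to re-fit model_3 with Type 1 as the reference category, so that the Type 2 coefficient directly contrasts the two types. This is a sketch; it assumes `type of ICU` is coded with levels "0", "1", "2". A general linear hypothesis test (e.g., multcomp::glht) would be an alternative:

```r
df_ICU$`type of ICU` <- relevel(factor(df_ICU$`type of ICU`), ref = "1")

model_3_relevel <- lmer(`stress level` ~ `type of ICU` + support_l1 + support_l2 +
  (support_l1 | CodeService), REML = TRUE, data = df_ICU)

# the `type of ICU`2 row now estimates the Type 2 vs. Type 1 difference
summary(model_3_relevel)$coefficients
```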

Try to simplify model_3 by removing random slope

model_4 <- lmer(`stress level` ~ `type of ICU` + support_l1 + support_l2 +
  (1 | CodeService), REML=T, data=df_ICU)

model_4_ML <- lmer(`stress level` ~ `type of ICU` + support_l1 + support_l2 +
  (1 | CodeService), REML=F, data=df_ICU)
##  Groups      Name        Variance Std.Dev.
##  CodeService (Intercept)  0.50043 0.70741 
##  Residual                21.83045 4.67231
##  Number of Level Two groups =  30
##                   Estimate Std. Error    t value
## (Intercept)    22.90115381  0.4171505 54.8990222
## `type of ICU`1 -0.89621951  0.5847246 -1.5327207
## `type of ICU`2  0.59368625  0.6934261  0.8561637
## support_l1     -0.01950721  0.1132695 -0.1722194
## support_l2     -0.35370396  0.3830209 -0.9234586

Compare model_3 and model_4 (1/2)

Information criteria like AIC and BIC can be used to compare both nested and non-nested models (smaller is better).

Likelihood ratio tests can be used for nested models.

Are model_3 and model_4 nested?

anova(model_4, model_3, test='Chisq')
## refitting model(s) with ML (instead of REML)
## Data: df_ICU
## Models:
## model_4: `stress level` ~ `type of ICU` + support_l1 + support_l2 + (1 | CodeService)
## model_3: `stress level` ~ `type of ICU` + support_l1 + support_l2 + (support_l1 | CodeService)
##         npar    AIC    BIC  logLik deviance  Chisq Df Pr(>Chisq)
## model_4    7 3133.1 3163.0 -1559.6   3119.1                     
## model_3    9 3134.7 3173.1 -1558.3   3116.7 2.4646  2     0.2916

Compare model_3 and model_4 (2/2)

Let’s try bootstrapped likelihood ratio test (BLRT).

b3 <- lmeresampler::bootstrap(model_3_ML, .f = logLik, type = "parametric", B = 1000)
b4 <- lmeresampler::bootstrap(model_4_ML, .f = logLik, type = "parametric", B = 1000)
lrt_b <- -2 * b4$replicates + 2 * b3$replicates
quantile(lrt_b, probs = c(.025, .975))
##      2.5%     97.5% 
## -81.85642  98.24528

The CI includes 0 -> neither model is significantly better.

-> The information criteria indicate a better fit of model_4, and although it is more parsimonious, it does not fit the data significantly worse than model_3.

If your theory doesn’t emphasize the random slope, then the random intercept alone seems to be sufficient.

Always pick the model that fits your theory.

Note: Theory is fixed before the analysis.

Homework (graded)

due by 2022-12-08

Thank you for your attention!

Next Time (2022-12-15):

Missing Data

Note: There will be just one “your pick” session, because the homework was more work than I expected at the start of the semester.